Abstract. We examine node centrality measures based on the notion of total communicability, 
defined in terms of the row sums of matrix functions of the adjacency matrix of the network. Our 
main focus is on the matrix exponential and the resolvent, which have natural interpretations in 
terms of walks on the underlying graph. While such measures have been used before for ranking 
nodes in a network, we show that they can be computed very rapidly even in the case of large 
networks. Furthermore, we propose the (normalized) total sum of node communicabilities as a 
useful measure of network connectivity. Extensive numerical studies are conducted in order to 
compare this centrality measure with the closely related ones of subgraph centrality [E. Estrada 
and J. A. Rodriguez-Velazquez, Phys. Rev. E, 71 (2005), 056103] and Katz centrality [L. Katz, 
Psychometrika, 18 (1953), pp. 39-43]. Both synthetic and real-world networks are used in the 
computations. 
1. Introduction. Over the past several years, the analysis of networks has become
increasingly important in a number of disciplines [15], [17], ISH, [23], [27], [40], [44], [47].

Network analysis is used in many situations: from determining network structure 
and communities, to describing the interactions between various elements of the net- 
work, to investigating the dynamics of phenomena taking place on the network (e.g., 
information flow). 

One of the fundamental questions in network analysis is to determine the "most 
important" elements in a given network. Measures of node importance are usually 
referred to as node centrality, and many centrality measures have been proposed, 
starting with the simplest of all, the node degree. This crude metric has the drawback
of being too "local", as it does not take into account the connectivity of the
immediate neighbors of the node under consideration. A number of more sophisticated
centrality measures have been introduced that take into account the global
connectivity properties of the network. These include various types of eigenvector cen- 
trality for both directed and undirected networks, betweenness centrality, and others 
which are discussed below. Overviews of various centrality measures can be found in 
[SI [ini [m mi [M1 HSl \M\. The centrality scores can be used to provide rankings of 
the nodes in the network. There are many different ranking methods in use (most of 
which depend on centrality measures), and many algorithms have been developed to 
compute these rankings. Information about the many different ranking schemes can 
be found, e.g., in |5l[2Tl[3a[3il39l[40l[4Tl[42]. 

One now standard method of measuring node importance is subgraph centrality 
[5S], which is based on the diagonal entries of a matrix function applied to the
adjacency matrix $A$ of the network in question. Here, the matrix exponential $e^A$ is 
frequently used. While this approach has been successfully used in a number of
problems [23 [13 [5H], obtaining estimates of the diagonals of $e^A$ for a large network with 
adjacency matrix $A$ can be quite expensive. Indeed, computing individual entries 
of matrix functions $f(A)$ is generally costly for large $A$ even with the best available 
algorithms [^177]. 

In recent years, efficient algorithms have been developed for computing the action 
of a matrix function on a vector, that is, for computing the vector $f(A)v$ for a given 
matrix $A$ (usually large and sparse), vector $v$, and function $f$. A particularly
important case is that of the matrix exponential, since this provides a solution method for 
initial value problems for first-order systems of linear ordinary differential equations. 
These algorithms, based on variants of the Lanczos, Arnoldi, or other Krylov subspace 
methods, access the matrix $A$ only in the form of (sparse) matrix-vector products and 
have $O(n)$ storage cost for a sparse $n \times n$ matrix $A$ [36, Chapter 13]. When $v = \mathbf{1}$, 
the vector with all its entries equal to 1, the $i$th entry of the resulting vector $f(A)\mathbf{1}$ 
contains the $i$th row sum of $f(A)$: 

$$[f(A)\mathbf{1}]_i = \sum_{j=1}^{n} [f(A)]_{ij}, \qquad 1 \le i \le n.$$

This quantity, which has a graph-theoretic interpretation in terms of subgraph
centrality and communicability [26], [27], can be computed much faster than subgraph 
centrality using current computational techniques. Of course, the same is true if the 
vector $\mathbf{1}$ is replaced by some other vector, typically an "external importance vector" 
that can be used to take into account intrinsic, non-network-related contributions to 
the centrality of each node [45, pp. 174-175]. 
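
To make the row-sum computation concrete, the following minimal Python sketch forms $e^A\mathbf{1}$ using only matrix-vector products with $A$. It uses a plain truncated Taylor series rather than the Krylov subspace methods cited above, so it is an illustration for small examples and not the algorithm used in the paper. The names `matvec` and `expm_action`, the neighbor-list representation `adj`, and the toy path graph are assumptions made for this example.

```python
def matvec(adj, x):
    """Multiply the adjacency matrix by x using only the neighbor lists."""
    return [sum(x[j] for j in neighbors) for neighbors in adj]


def expm_action(adj, v, terms=60):
    """Approximate e^A v by the truncated Taylor series sum_{k<terms} A^k v / k!."""
    result = list(v)
    term = list(v)  # holds A^k v / k! for the current k
    for k in range(1, terms):
        term = [t / k for t in matvec(adj, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


# Hypothetical toy network: the path graph 0 - 1 - 2, stored as neighbor lists.
path = [[1], [0, 2], [1]]
row_sums = expm_action(path, [1.0, 1.0, 1.0])  # i-th entry = i-th row sum of e^A
```

Only the neighbor lists are ever touched, which mirrors the point made above: the matrix $e^A$ itself is never formed. In this toy example the middle node of the path receives the largest row sum.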

Such centrality measures have long been in use in network analysis. Note that for 
the case of the "identity function" $f(A) = A$, and symmetric $A$ (undirected networks), 
we recover degree centrality. The off-diagonal row sums of $e^A$ have been used in social 
network analysis to measure the resilience of an individual in the face of hostile attacks 
from within the network [21, Chapter 6]. More recently, row and column sums of $e^A$ 
have been applied to the identification of hubs and authorities in directed networks 
[5]. For resolvent-type functions, such as $f(A) = (I - \alpha A)^{-1}$ (with $I$ the $n \times n$ identity 
matrix), for suitable values of $\alpha > 0$, we recover the well-known Katz centrality and its 
variants, also known as $\alpha$-centrality; see, e.g., [37] and [10], HH, [13], [33]. None of these 
previous studies, however, considered algorithmic aspects such as computational cost, 
storage, and so forth. 
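
As an illustration of the resolvent case, the sketch below computes the row sums of $(I - \alpha A)^{-1}$ by a simple fixed-point iteration. This is only one possible way to solve the linear system and is not taken from the paper; the function name `katz_centrality`, the tolerance, and the star-graph example are assumptions. The iteration sums the Neumann series and therefore converges only when $\alpha < 1/\lambda_1$, where $\lambda_1$ is the largest eigenvalue of $A$.

```python
def katz_centrality(adj, alpha, tol=1e-12, max_iter=10_000):
    """Solve (I - alpha*A) x = 1 by iterating x <- 1 + alpha * A x.

    Converges only when alpha is smaller than 1 / (largest eigenvalue of A).
    """
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new_x = [1.0 + alpha * sum(x[j] for j in neighbors) for neighbors in adj]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("no convergence; alpha may be too large")


# Hypothetical example: star graph with center 0 and three leaves.
# Its largest eigenvalue is sqrt(3), so alpha = 0.25 is admissible.
star = [[1, 2, 3], [0], [0], [0]]
scores = katz_centrality(star, alpha=0.25)
```

As with the exponential, each step needs only one sparse matrix-vector product.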

This paper considers the implications of using the row sums of $e^A$ or similar
matrix functions as a measure of node centrality, focusing for the sake of brevity on 
undirected networks. The interpretation of this measure in terms of total
communicability of a node is given, and compared to the one for subgraph centrality, in section 
3. In section 4 the concept of total network communicability is introduced and
discussed. Section 5 contains experimental comparisons of subgraph centrality and total 
communicability using various synthetic and real-world networks. Sections 6 and 7 
discuss computational aspects and the use of row sum centrality with other standard 
matrix functions, respectively. We offer some concluding remarks in section 8. 

2. Background and definitions. The analysis of networks requires the use of 
notions from graph theory, linear algebra, numerical analysis, and computer science. 
Here we list some basic definitions and ideas from graph theory. A more complete 
overview can be found in [19] . 

A graph $G = (V, E)$ is a set of nodes (vertices) $V$ with $|V| = n$ and edges 
$E = \{(i,j) \mid i,j \in V\}$. A graph is undirected if the edges are unordered pairs of vertices 
and directed if the pairs are ordered (edges have a direction). The degree of a vertex 
in an undirected graph is the number of edges incident to the node. In a 
directed graph, nodes have both an in-degree, the number of edges pointing into the 
node, and an out-degree, the number of edges starting at the node and pointing away. 
A simple graph is a graph with no loops (edges from node $i$ to itself), no multiple edges, 
and unweighted edges. In this paper, all networks correspond to simple, undirected 
graphs unless otherwise specified. 

A walk of length $k$ on a graph $G$ is a sequence of vertices $v_1, v_2, \ldots, v_{k+1}$ such that 
$(v_i, v_{i+1}) \in E$ for all $1 \le i \le k$. A path is a walk with no repeated vertices. A closed 
walk is a walk that starts and ends at the same vertex. A cycle is a closed walk with 
no repeated vertices other than the first and last. If any vertex in the graph is reachable from any other vertex, 
the graph is said to be connected. 

Every graph can be viewed as a matrix through the use of its adjacency matrix. 
The adjacency matrix of a network with graph $G$ is given by $A = (a_{ij})$, where

$$a_{ij} = \begin{cases} 1, & \text{if } (i,j) \text{ is an edge in } G, \\ 0, & \text{else.} \end{cases}$$

The requirement of unweighted edges causes $A$ to be binary and that of no loops in 
the graph forces $A$ to have zeros along its diagonal. If the network is undirected, $A$ 
will be symmetric, but if the network is directed, $A$ will generally be unsymmetric. 
In the case of an undirected network, the eigenvalues of $A$ will be real. We label the 
eigenvalues of $A$ in non-increasing order: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Note that the
Perron-Frobenius theorem implies that $\lambda_1 > \lambda_2$ if the graph is connected (equivalently, if $A$ 
is irreducible). 

3. Diagonal entries vs. row sums. Estrada and Rodriguez-Velazquez (Phys. Rev. E, 71
(2005), 056103) introduced the concept of subgraph centrality as a centrality
measurement for nodes in a network. This 
provides a ranking based on the diagonal entries of a matrix function applied to the 
adjacency matrix. Although there are various choices of function to use, the most 
common is the matrix exponential. The subgraph centrality of node $i$ is given by $[e^A]_{ii}$, 
where $A$ is the adjacency matrix of the network. The subgraph communicability
between nodes $i$ and $j$ is given by $[e^A]_{ij}$ (note that in the case of an undirected network, $A$ 
is symmetric and $[e^A]_{ij} = [e^A]_{ji}$). A node with a (relatively) large subgraph centrality 
is considered to be more important in the network and is given a higher ranking than 
nodes with lower subgraph centrality. A (relatively) large subgraph communicability 
between a pair of nodes $i$ and $j$ indicates that information flows more easily between 
those two nodes than between pairs of nodes with lower communicability. In other 
words, a low subgraph communicability indicates that the two nodes cannot easily 
exchange information. Network communicability can also be interpreted in terms of 
the correlations between different components of physical systems; see, e.g., [27]. 

The reasoning behind using the diagonal entries of $e^A$ as a measure of the
centrality of a node in the network can be seen by considering the power series expansion 
of $e^A$ [36]: 

$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots + \frac{A^k}{k!} + \cdots = \sum_{k=0}^{\infty} \frac{A^k}{k!}. \qquad (3.1)$$



It is well known in graph theory (and fairly easy to prove) that if $A$ is the adjacency 
matrix of a network with unweighted edges, then $[A^k]_{ij}$ counts the number of walks 
of length $k$ between nodes $i$ and $j$. Thus, the subgraph centrality of node $i$, which 
is equal to $[e^A]_{ii}$, counts the number of closed walks centered at node $i$, weighting a 
walk of length $k$ by a penalty factor of $1/k!$. In this way, shorter walks are deemed 
more important than longer walks. Although some of these walks can be described as 
"illogical" (for example, the walk $v_i \to v_j \to v_i \to v_j \to v_i$ is a closed walk of length 4 
centered at node $i$), the subgraph centrality of node $i$ still gives us a measure of how 
close node $i$ is to everything else in the network. 
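
The walk-counting interpretation can be checked directly on a small example. The sketch below (plain Python on a dense matrix, suitable only for tiny graphs) counts closed walks on a triangle via powers of $A$ and sums them with the $1/k!$ weights. The helper names `mat_mul`, `walk_counts`, and `subgraph_centrality` are hypothetical and introduced only for this illustration.

```python
import math


def mat_mul(a, b):
    """Dense product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_counts(a, length):
    """Return A^length; its (i, j) entry counts walks of that length from i to j."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(length):
        power = mat_mul(power, a)
    return power


def subgraph_centrality(a, i, terms=30):
    """Closed walks at node i, each walk of length k weighted by 1/k!."""
    return sum(walk_counts(a, k)[i][i] / math.factorial(k) for k in range(terms))


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
closed_2 = walk_counts(triangle, 2)[0][0]  # walks 0-1-0 and 0-2-0
closed_3 = walk_counts(triangle, 3)[0][0]  # the triangle traversed in both directions
sc_0 = subgraph_centrality(triangle, 0)
```

For the triangle, whose eigenvalues are 2, -1, -1, the weighted sum agrees with $[e^A]_{00} = (e^2 + 2e^{-1})/3$.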

By contrast, the row sum of $e^A$ for node $i$ is given by $\sum_{j=1}^{n} [e^A]_{ij}$, which counts all 
walks between node $i$ and all the nodes in the network (node $i$ included), weighting 
walks of length $k$ by a penalty factor of $1/k!$. Thus, the $i$th row sum of $e^A$ can be 
interpreted as the total subgraph communicability of node $i$ and used as 
a measure of the importance of the $i$th node in the network, since a node with high 
communicability with a large number of other nodes in the network is likely to be an 
important node, and certainly a more important node than one characterized by low 
total communicability. 

An immediate question is how this centrality measure compares with the subgraph 
centrality of node $i$ in the network. In general, the rankings produced by the total 
communicability measure will not be the same as those produced by the subgraph 
centrality measure. The difference between the two measures for node $i$ is 

$$\sum_{j=1}^{n} [e^A]_{ij} - [e^A]_{ii} = \sum_{j \ne i} [e^A]_{ij} = \sum_{j \ne i} \sum_{k=1}^{n} e^{\lambda_k} v_{ik} v_{jk},$$

where $v_{ik}$ is the $i$th element of the normalized eigenvector of $A$ associated with the 
eigenvalue $\lambda_k$. Note that $e^A$ is always positive definite and that its diagonal entries 
are often large compared to the off-diagonals. If the diagonal entries of $e^A$ vary over 
a wide range while its off-diagonal sums remain confined within a narrower range, 
the rankings produced by the two methods will not differ by much. However, this 
depends both on the spectrum of $A$ and the entries of the eigenvectors. 
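
As a brief worked step (a sketch, assuming $A$ is symmetric; the symbols $V$ and $\Lambda$ are introduced here only for illustration), the spectral form of the entries of $e^A$ used in this comparison follows from the eigendecomposition of $A$:

```latex
A = V \Lambda V^T, \quad V = [v_1, \ldots, v_n], \quad
\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)
\;\Longrightarrow\;
e^A = V e^{\Lambda} V^T
\;\Longrightarrow\;
[e^A]_{ij} = \sum_{k=1}^{n} e^{\lambda_k} v_{ik} v_{jk}.
```

Setting $j = i$ gives the subgraph centrality $\sum_{k} e^{\lambda_k} v_{ik}^2$, and summing over $j$ gives the total communicability $\sum_{k} e^{\lambda_k} (v_k^T \mathbf{1}) v_{ik}$.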

While it appears to be difficult, in general, to establish a relation between the 
rankings produced by the subgraph centrality and total communicability, for certain 
types of simple graphs it is easy to show that the two methods will give identical 
rankings. These include complete graphs and cycles (where each node has the exact 
same ranking under both systems), paths and star graphs. A star graph on n nodes 
has one central node that is connected to each of the n — 1 remaining nodes and no 
other edges. Under both ranking systems, the central node is ranked highest and the 
remaining nodes all have the same scores. This can be shown either using graph theory 
or by examining the eigenvalues and eigenvectors of the star graph (more information 
about the spectra of star graphs can be found in [2]). 
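
The star graph claim is easy to check numerically. The sketch below computes both measures for a five-node star with a truncated Taylor series for the action of $e^A$ on a vector; this is an illustrative shortcut for a tiny graph, not the method used for the experiments. The function names and the neighbor-list layout are assumptions.

```python
def expm_action(adj, v, terms=60):
    """Approximate e^A v by a truncated Taylor series, using neighbor lists."""
    result, term = list(v), list(v)
    for k in range(1, terms):
        term = [sum(term[j] for j in neighbors) / k for neighbors in adj]
        result = [r + t for r, t in zip(result, term)]
    return result


def subgraph_and_total(adj):
    """Return (diagonal entries of e^A, row sums of e^A)."""
    n = len(adj)
    total = expm_action(adj, [1.0] * n)
    subgraph = []
    for i in range(n):
        unit = [0.0] * n
        unit[i] = 1.0
        subgraph.append(expm_action(adj, unit)[i])  # i-th entry of e^A e_i
    return subgraph, total


# Star graph on 5 nodes: node 0 is the center, nodes 1..4 are the leaves.
star = [[1, 2, 3, 4], [0], [0], [0], [0]]
subgraph, total = subgraph_and_total(star)
```

Under both measures the center scores highest and the four leaves tie, in line with the statement above. Note that the diagonal entries require one action of $e^A$ per node, whereas all the row sums come from a single action on the all-ones vector.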

One case where the two measures could be expected to give similar rankings is 
that of networks with a large spectral gap, which for the purposes of this paper is the 
difference $\lambda_1 - \lambda_2$ between the first (largest) and second eigenvalue. We have: 

$$[e^A]_{ii} = e^{\lambda_1} v_{i1}^2 + \sum_{k=2}^{n} e^{\lambda_k} v_{ik}^2$$

and 

$$[e^A \mathbf{1}]_i = e^{\lambda_1} (v_1^T \mathbf{1}) v_{i1} + \sum_{k=2}^{n} e^{\lambda_k} (v_k^T \mathbf{1}) v_{ik}.$$

Dividing both expressions by the constant $e^{\lambda_1}$ (which does not affect the rankings) 
and observing that $v_1^T \mathbf{1} = \|v_1\|_1$ shows that for $\lambda_1 \gg \lambda_2$ the two rankings are 
largely determined by the quantities $v_{i1}^2$ and $\|v_1\|_1 v_{i1}$, respectively, and therefore by 
the entries $v_{i1}$ of the dominant eigenvector of $A$. Thus, if the difference $\lambda_1 - \lambda_2$ is 
sufficiently large, the two centrality measures reduce to eigenvector centrality [10] 
and therefore can be expected to result in very similar rankings, especially for the top 
nodes. Numerical experiments (not shown here) performed on Erdős-Rényi graphs 
with large spectral gaps have confirmed this fact. 

However, it is difficult to quantify a priori how large the spectral gap needs 
to be for all these rankings to be identical (or even approximately the same). In 
the section on computational experiments we will see that there can be significant 
differences between the rankings obtained using subgraph centrality and those using 
total communicability centrality, even for networks with a relatively large spectral 
gap. 

4. Total network communicability. The total communicabilities of individual
nodes give a measure of how well each node communicates with the other nodes 
of the network. In order to measure how effectively communication takes place across 
the network as a whole, we consider the sum of all the total communicabilities. For a 
network with adjacency matrix $A$, this is given by 

$$C(A) = \sum_{i=1}^{n} [e^A \mathbf{1}]_i = \sum_{i=1}^{n} \sum_{k=1}^{n} e^{\lambda_k} (v_k^T \mathbf{1}) v_{ik} = \mathbf{1}^T e^A \mathbf{1}, \qquad (4.1)$$

where, as in section 3, $\lambda_k$ is the $k$th eigenvalue of $A$ and $v_{ik}$ is the $i$th element of 
the normalized eigenvector $v_k$ associated with $\lambda_k$. Here we propose to use the total 
network communicability, $C(A)$, as a global measure of the ease of sending information 
across a network. We emphasize that while $C(A)$ is defined as the sum of all the entries 
of $e^A$, it is not necessary to know any of the individual entries of $e^A$ to compute $C(A)$; 
indeed, very efficient methods exist to compute quadratic forms of the type $v^T f(A) v$ 
for a given function $f(x)$, matrix $A$, and vector $v$; see [¥1 [SI 132]. 

It is instructive to compare the total communicability of a network with the 
Estrada index, an important graph invariant defined as the sum of all the subgraph 
centralities: 

$$EE(A) = \sum_{i=1}^{n} [e^A]_{ii} = \sum_{i=1}^{n} e^{\lambda_i} = \mathrm{Tr}(e^A).$$

The following proposition provides simple lower and upper bounds for $C(A)$ in terms 
of $EE(A)$ and other spectral quantities associated with the underlying network. 

Proposition 1. Let $A$ be the adjacency matrix of a simple network on $n$ vertices. 
Then, 

$$EE(A) \le C(A) \le n e^{\|A\|_2},$$

where $\|A\|_2$ denotes the spectral norm of $A$. In particular, for an undirected network 
we have 

$$EE(A) \le C(A) \le n e^{\lambda_1}.$$



^By the Perron-Frobenius Theorem, the dominant eigenvector can be chosen to have nonnegative 
entries, and positive entries when the graph G is connected. 
Proof. The lower bound is trivial, as 

$$EE(A) = \sum_{i=1}^{n} [e^A]_{ii} \le \sum_{i=1}^{n} \sum_{j=1}^{n} [e^A]_{ij} = \sum_{i=1}^{n} [e^A \mathbf{1}]_i = C(A).$$

The upper bound follows from noticing that $C(A) = \mathbf{1}^T e^A \mathbf{1} = (e^A \mathbf{1})^T \mathbf{1} = (e^A \mathbf{1}, \mathbf{1})$ 
and applying the Cauchy-Schwarz inequality to the quadratic form $(e^A \mathbf{1}, \mathbf{1})$: 

$$|(e^A \mathbf{1}, \mathbf{1})| \le \|e^A \mathbf{1}\|_2 \|\mathbf{1}\|_2 \le \|e^A\|_2 \|\mathbf{1}\|_2 \|\mathbf{1}\|_2 \le n e^{\|A\|_2}.$$

For an undirected network $A$ is symmetric and $\lambda_1 = \|A\|_2$. $\square$

Note that the lower bound is attained in the case of the "empty" graph with 
adjacency matrix $A = 0$, while the upper bound is attained on the complete graph, 
whose adjacency matrix is $A = \mathbf{1}\mathbf{1}^T - I$. 
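
As a quick check of the claim about the complete graph, the following worked computation (a sketch using only the definitions above) shows why the upper bound is attained there: the all-ones vector is an eigenvector of the adjacency matrix for the largest eigenvalue $\lambda_1 = n - 1$.

```latex
A = \mathbf{1}\mathbf{1}^T - I
\;\Longrightarrow\;
A\mathbf{1} = (n-1)\,\mathbf{1}
\;\Longrightarrow\;
e^{A}\mathbf{1} = e^{\,n-1}\,\mathbf{1}
\;\Longrightarrow\;
C(A) = \mathbf{1}^T e^{A}\mathbf{1} = n\,e^{\,n-1} = n\,e^{\lambda_1}.
```

For the empty graph, $e^{A} = I$, so $EE(A) = C(A) = n$ and the lower bound holds with equality.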

The bounds from Proposition 1 also hold for $e^{\beta A}$, $\beta > 0$. For any connected 
graph with adjacency matrix $A$, the bounds get tighter as $\beta \to 0^+$, since both the 
lower and upper bound tend to $n$. The parameter $\beta$ can be interpreted as an inverse 
temperature and is a reflection of external disturbances on the network (see, e.g., 
[27] for details); taking $\beta \to 0^+$ is equivalent to "raising the temperature" of the 
environment surrounding the network. 

When appropriately normalized, $C(A)$ can be used to compare the ease of
information exchange on different networks. This could be useful, for instance, in the 
design of communication networks. In the following sections we compute the total 
communicability for various types of networks. The question arises of what would
constitute a reasonable normalization factor. There are several possibilities. Normalizing 
$C(A)$ by the number $n$ of nodes corresponds to the average total communicability of 
the network per node. Similarly, normalizing $C(A)$ by the number $m$ of edges would 
correspond to the average total communicability of the network per edge. We note 
also that the minimum value of $C(A)$ is $n$, corresponding to the empty graph on $n$ 
nodes ($E = \emptyset$), while the maximum value is $n e^{n-1}$, corresponding to the complete 
graph on $n$ nodes. The expression 

$$\frac{C(A) - n}{n e^{n-1} - n}$$

takes its values in the interval $[0, 1]$, with the value 0 for "empty" graphs (no
communication can take place on such graphs) and the value 1 on complete graphs (for which 
the ease of communication between nodes is clearly maximum). Unfortunately, the 
denominator in this expression grows so fast that for most sparse graphs evaluating 
it results in underflow. 

In the experiments below we chose to normalize $C(A)$ by $n$, the number of nodes, 
and by $m$, the number of edges; for the networks used in our tests we found that 
comparing networks based on $C(A)/n$ or on $C(A)/m$ yields exactly the same rankings, 
so we only include results for the former measure. 

5. Computational studies. In this section we carry out extensive centrality 
computations for a variety of networks, with the aim of comparing subgraph centrality 
with total communicability centrality. In particular, we are interested in determining 
if, or for what type of networks, the two centrality measures provide similar rankings. 
Moreover, for those networks where the two measures result in rankings that differ 
significantly, we would like to obtain some insights on why this is the case. Of course 
it would be desirable to know when one measure should be preferred to the other, but 
this is a difficult problem since it is not easy to come up with objective criteria for 
comparing ranking methods (see the discussion in [41, Chapter 16]). We will compare 
the two methods in terms of computational cost in section 6. 

To measure similarities between the rankings obtained with the two methods 
we use (Pearson) correlation coefficients and the intersection distance method (see 
[30] as well as [9], [16]) on both the full set $V$ of nodes and on partial lists of nodes. 
The correlation coefficients are computed using lists of nodes in rank order. The 
intersection distances are computed using the lists of subgraph centrality and total 
communicability values. Given two ranked lists $x$ and $y$, the intersection distance 
between the two lists is computed in the following way: let $x_k$ and $y_k$ be the top 
$k$ ranked items in $x$ and $y$, respectively. Then the top $k$ intersection distance (or 
intersection similarity) is given by 

$$\mathrm{isim}_k(x, y) = \frac{1}{k} \sum_{i=1}^{k} \frac{|x_i \,\Delta\, y_i|}{2i},$$

where $\Delta$ is the symmetric difference operator between the two sets. If the lists are 
identical, then $\mathrm{isim}_k(x, y) = 0$ for all $k$. If the two sequences are disjoint, then 
$\mathrm{isim}_k = 1$. We denote by cc the correlation coefficient between the two vector rankings, 
and by $cc_p$ the correlation coefficient between the top $p\%$ of nodes under the two 
ranking systems. We denote by $\mathrm{isim}_{p\%}$ the intersection distance between the top $p\%$ 
of nodes. 
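
The intersection distance is straightforward to compute from two ranked lists. The following is a minimal sketch of the top-$k$ definition (the average over $i = 1, \ldots, k$ of the size of the symmetric difference of the top-$i$ sets, divided by $2i$); the function name `intersection_distance` and the example lists are assumptions made for illustration.

```python
def intersection_distance(x, y, k):
    """Top-k intersection distance between two ranked lists x and y.

    For each i = 1..k, compare the sets formed by the top i items of each
    list, take the size of their symmetric difference divided by 2i, and
    average the results over i.
    """
    total = 0.0
    for i in range(1, k + 1):
        top_x, top_y = set(x[:i]), set(y[:i])
        total += len(top_x ^ top_y) / (2 * i)
    return total / k


# Hypothetical rankings of three nodes that differ only in the top two places.
ranking_a = ["a", "b", "c"]
ranking_b = ["b", "a", "c"]
distance = intersection_distance(ranking_a, ranking_b, 3)
```

In this example only the top-1 sets differ, so the distance is $(1 + 0 + 0)/3$. Identical lists give 0 and disjoint lists give 1, matching the two extreme cases stated above.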

Unless otherwise specified, all experiments were performed using Matlab version 
7.9.0 (R2009b) on a MacBook Pro running OS X Version 10.6.8, with a 2.4 GHz Intel Core 
i5 processor and 4 GB of RAM. In this section, we use the Matlab built-in function 
expm for computing the matrix exponential. 

5.1. Test matrices. The synthetic examples used in the tests were produced 
using the CONTEST toolbox in Matlab [1S1[13. The graphs tested were of two types: 
preferential attachment (Barabasi-Albert) model and small world (Watts-Strogatz) 
model. In CONTEST, these graphs and the corresponding adjacency matrices can be 
built using the functions pref and smallw, respectively. 

The preferential attachment model was designed to produce networks with scale-free
degree distributions as well as the small world property [3]. In CONTEST,
preferential attachment networks are constructed using the command pref(n,d), where 
$n$ is the number of nodes in the network and $d \ge 1$ is the number of edges each new 
node is given when it is first introduced to the network. The network is created by 
adding nodes one by one (each new node with $d$ edges). The edges of the new node 
connect to nodes already in the network with a probability proportional to the degree 
of the already existing nodes. This results in a scale-free degree distribution. Note 
that with this construction, the minimum degree of the network is $d$. When $d > 1$ 
this means that the network has no dangling nodes (nodes of degree 1), whereas in 
many real-life networks one often observes a high number of dangling nodes. In the 
CONTEST toolbox, the default value is $d = 2$. 
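
The construction just described can be sketched in a few lines of Python. This is not the CONTEST `pref` implementation: the function name `preferential_attachment`, the choice of a small clique as the starting graph, and the rejection sampling used to obtain $d$ distinct targets are all assumptions made for illustration.

```python
import random


def preferential_attachment(n, d, seed=None):
    """Grow a simple undirected graph on n >= d + 1 nodes; returns neighbor sets."""
    rng = random.Random(seed)
    # Assumed starting graph: a clique on d + 1 nodes, so each has degree d.
    adj = [set(range(d + 1)) - {i} for i in range(d + 1)]
    # Each node appears in `stubs` once per incident edge, so a uniform draw
    # from it selects a node with probability proportional to its degree.
    stubs = [i for i in range(d + 1) for _ in range(d)]
    for new in range(d + 1, n):
        targets = set()
        while len(targets) < d:  # redraw until d distinct targets are found
            targets.add(rng.choice(stubs))
        adj.append(set(targets))
        for t in targets:
            adj[t].add(new)
            stubs.extend((t, new))
    return adj


network = preferential_attachment(1000, 2, seed=0)
```

Because every new node arrives with exactly $d$ edges, the minimum degree of the resulting graph is $d$, as noted above.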

In our experiments, we tested various values of $d$ on a network of size $n = 1000$: 
twenty networks were tested for all values $1 \le d \le 10$, as well as a few larger 
values. In Table 5.1 the averages of the correlation coefficients between the subgraph 
centrality rankings and the total subgraph communicability rankings can be found for 
various values of $d$. The intersection distance values can be found in Table 5.2. The 
intersection distance values were calculated both for the full set of rankings and for 
the top 10% of ranked nodes. 

Table 5.1: Comparison, using the correlation coefficient, of rankings based on the 
diagonal entries and row sums of $e^A$ for 1000-node scale-free networks of various
parameters built using the pref function in the CONTEST Matlab toolbox. The values 
reported are the averages over 20 matrices with the same parameters. The parameter 
$d$ is the initial degree of nodes in the network (and consequently the minimum degree 
of the network). 

d      cc
1      0.224
2      0.343
3      0.517
4      0.905
5      0.993
6      0.999
7      0.999
≥ 8    1

Table 5.2: Intersection distance comparisons of rankings based on the diagonal entries 
and row sums of $e^A$ for 1000-node scale-free networks of various parameters built 
using the pref function in the CONTEST Matlab toolbox. The values reported are 
the averages over 20 matrices with the same parameters. The parameter $d$ is the 
initial degree of nodes in the network (and consequently the minimum degree of the 
network). 

d      isim       isim_10%
1      0.174      0.199
2      0.036      0.031
3      0.003      0.005
4      2.04e-4    2.79e-4
5      1.30e-5    1.71e-5
6      9.83e-7    0
7      4.93e-7    0
≥ 8    0          0

The results show that correlation between the two metrics increases and the in- 
tersection distance value decreases quickly with the value of the parameter d. The 
intersection distance values for the top 10% of nodes are very close to those for the 
complete set of nodes. For sufficiently dense networks, the two measures provide essen- 
tially identical rankings, producing correlation coefficients close to 1 and intersection 
distances close to 0. 

A second class of synthetic test matrices used in our experiments corresponds to 
small-world networks (Watts-Strogatz model). The small world model was developed 
as a way to impose a high clustering coefficient onto classical random graphs [50]. 
The name comes from the fact that, like classical random graphs, the Watts-Strogatz 
model produces networks with the small world (that is, small graph diameter)
property. To build these matrices, the input is smallw(n,d,p), where $n$ is the number of 
nodes in the network, which are arranged in a ring and connected to their $d$ nearest 
neighbors on the ring. Then each node is considered independently and, with
probability $p$, a link is added between the node and one of the other nodes in the network, 
chosen uniformly at random. At the end of this process, all loops and repeated edges 
are removed. For this set of experiments, the size of the network was fixed at $n = 1000$ 
and the probability of an extra link was left at the default value of $p = 0.1$ while $d$ 
was varied. 

Table 5.3: Comparison, using the correlation coefficient, of rankings based on the 
diagonal entries and row sums of $e^A$ for 1000-node small world networks of various 
parameters made using the smallw function in the CONTEST Matlab toolbox. The 
values reported are the averages over 20 matrices with the same parameters. 

(a)
d      cc
1      0.177
2      0.089
3      0.037
4      0.033
5      0.031
6      0.048
7      0.039
8      0.046
9      0.031
10     0.054

(b)
d        cc
20       0.156
30       0.222
40       0.240
50       0.310
60       0.426
70       0.431
80       0.747
90       0.926
100      0.997
≥ 110    1
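
The small world construction described in this section can be sketched as follows. This is an illustrative reimplementation, not the CONTEST `smallw` code: the function name `small_world` is hypothetical, and the sketch assumes that "d nearest neighbors" means $d$ neighbors on each side of the ring, which is consistent with the 5000-node ring lattice with $d = 1$ having 5000 edges in Table 5.5.

```python
import random


def small_world(n, d, p, seed=None):
    """Ring lattice plus random shortcuts; returns a list of neighbor sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    # Ring lattice: connect each node to its d nearest neighbors on each side.
    for i in range(n):
        for step in range(1, d + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    # Shortcuts: each node, independently and with probability p, gets one
    # extra link to a node chosen uniformly at random.
    for i in range(n):
        if rng.random() < p:
            j = rng.randrange(n)
            if j != i:  # loops are dropped; sets discard repeated edges
                adj[i].add(j)
                adj[j].add(i)
    return adj


ring = small_world(10, 1, 0.0)               # p = 0 gives the plain ring lattice
shortcuts = small_world(5000, 1, 0.1, seed=1)
```

With $p = 0$ the result is the regular ring, in which every node has degree $2d$; for $p > 0$ the expected number of extra edges is slightly below $pn$ because loops and repeated edges are discarded.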

The values of $d$ tested were: all values $1 \le d \le 10$, along with all multiples 
of 10 up to 200. Twenty networks were created for each value of $d$. 
The average correlation coefficients between the subgraph centrality rankings and the 
total communicability rankings are given in Table 5.3. As before, the correlation 
coefficients were computed between the complete sets of rankings. The intersection 
distances, reported in Table 5.4, were computed on both the complete sets of rankings 
and the top 10% of ranked nodes. 

It is evident from these results that for this class of small world networks, the 
similarity between the two ranking measures is much weaker than for the preferential 
attachment model, at least as long as the networks remain fairly sparse. The
intersection distances are also relatively large, further indicating that the two measures are 
much more weakly related than in the case of the preferential attachment model. For 
some values of $d$, the intersection distance between the top 10% of nodes is above 0.7, 
indicating that there is little consistency among the rankings of the top 10% of nodes 
under the two measures. As the networks become increasingly dense, however, the 
correlation between the two measures becomes stronger and the intersection distance 
eventually decreases. 
Table 5.4: Intersection distance comparison of rankings based on the diagonal entries 
and row sums of $e^A$ for 1000-node small world networks of various parameters made 
using the smallw function in the CONTEST Matlab toolbox. The values reported 
are the averages over 20 matrices with the same parameters. 

(a)
d      isim     isim_10%
1      0.015    0.071
2      0.056    0.160
3      0.089    0.252
4      0.117    0.350
5      0.151    0.479
6      0.178    0.621
7      0.218    0.709
8      0.243    0.731
9      0.262    0.705
10     0.284    0.725

(b)
d        isim       isim_10%
20       0.311      0.713
30       0.239      0.535
40       0.133      0.351
50       0.111      0.214
60       0.039      0.120
70       0.014      0.041
80       0.002      0.007
90       1.71e-4    4.05e-4
100      5.88e-6    1.09e-5
≥ 110    0          0


Table 5.5: Comparison of the total network communicability $C(A)$ of a ring lattice 
and small world rings with increasing probability of a shortcut. The computed values 
were averaged over 20 instances. 

Graph                     number of edges    C(A)       normalized C(A)
5000 node ring lattice    5000               3.69e04    7.4
smallw(5000,1,.1)         5492               4.83e04    9.7
smallw(5000,1,.2)         6222               6.22e04    12.4
smallw(5000,1,.3)         6495               7.92e04    15.8
smallw(5000,1,.4)         6990               9.90e04    19.8
smallw(5000,1,.5)         7496               1.24e05    24.8
smallw(5000,1,.6)         7999               1.53e05    30.6


5.2. Total communicability in small world networks. For networks with 
low connectivity (or high locality), the total network communicability can be expected 
to be low compared with networks with higher connectivity. For instance, on a 5000 
node ring lattice, the total network communicability is $C(A)$ = 3.69e04 and the
normalized $C(A)$ is 7.4. However, when even a few shortcuts are added across the lattice 
using the Watts-Strogatz small world model, this value jumps considerably. If the 
probability of a shortcut is $p = 0.1$, the normalized total network communicability 
(averaged over 20 networks created using input smallw(5000,1,p)) is 9.7. If the 
probability of a shortcut is increased to $p = 0.2$, the normalized total network
communicability increases to 12.4. These and additional results can be found in Table 5.5 
and Fig. 5.1. 
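
The ring lattice figure can be reproduced without forming $e^A$. In a ring where every node has exactly two neighbors, the all-ones vector is an eigenvector of $A$ with eigenvalue 2, so the total network communicability is $n e^2$ and its value per node is $e^2 \approx 7.39$. The sketch below checks this with a truncated Taylor series built from matrix-vector products; the function name `total_communicability` and the neighbor-list layout are assumptions, and this is not the Matlab code used for the experiments.

```python
import math


def total_communicability(adj, terms=60):
    """Sum of all entries of e^A, from matrix-vector products on the ones vector."""
    n = len(adj)
    term = [1.0] * n  # holds A^k 1 / k! for the current k
    total = float(n)
    for k in range(1, terms):
        term = [sum(term[j] for j in neighbors) / k for neighbors in adj]
        total += sum(term)
    return total


n = 5000
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # 2-regular ring lattice
c_ring = total_communicability(ring)
normalized = c_ring / n
expected = n * math.exp(2)  # closed form for a 2-regular graph
```

Rounded to one decimal place, `normalized` is 7.4, and `c_ring` is about 3.69e04, in agreement with the values quoted for the 5000 node ring lattice.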

Fig. 5.1: Plot of the total network communicability $C(A)$ for small world graphs 
with increasing probability $p$ of a shortcut. The computed values were averaged over 
20 instances. 

5.3. Discussion of test results using synthetic data. The results reported 
so far can be explained as follows. In a (regular) ring-shaped network, no node is more 
central than the other nodes and no reasonable centrality measure would be able to 
assign a (strict) ranking of the nodes. In a small world network obtained by perturbing 
a regular ring-shaped network, all the nodes have approximately the same importance, 
with the nodes with extra links ("shortcuts") being slightly more important than the 
others. When $d$ is small, these shortcuts matter more, but the subgraph centrality 
scores and the total communicability scores do not have a large range. Because of this, the 
change in the scores caused by moving from the subgraph centrality measure to the total 
communicability measure can have a high impact on node rankings. This leads to a 
low correlation and a relatively large intersection distance between the two rankings. 
When $d$ gets very large, the shortcuts matter less and cause fewer perturbations between 
the two sets of rankings. By contrast, in a scale-free preferential attachment network 
both the subgraph centrality scores and total communicability scores are spread out 
over a large range, even for small $d$, and adding the corresponding off-diagonal row 
sums to the diagonal entries does not change the rankings as much. 

5.4. Real data. Next, we study correlations between the two ranking methods 
using various networks corresponding to real data. The networks in this section come 
from a variety of sources. The Zachary Karate Club network is a classic example 
in network analysis [51]. The Intravenous Drug User and Yeast PPI networks were
provided to us by Prof. Ernesto Estrada. The Yeast PPI network has 440 ones on the 
diagonal due to the self-interactions of certain proteins. The remainder of the net- 
works can be found in the University of Florida Sparse Matrix Collection [18] under 
different "groups". The Erdos networks are from the Pajek group. They represent 
various subnetworks of the Erdos collaboration network. The ca-GrQc and ca-HepTh 
from the SNAP group are collaboration networks for the arXiv General Relativity and 
High Energy Physics Theory subsections, respectively. The as-735 network, also from 
the SNAP group, contains the communication network of a group of Autonomous 
Systems (AS) measured over 735 days between November 8, 1997 and January 2, 
2000. Communication occurs when routers from two Autonomous Systems exchange 
information. The Minnesota network from the Gleich group represents the Minnesota

Table 5.6: Comparison of rankings based on the diagonal and row sums of $e^A$ for
various real-world networks.

Graph                 n      nnz     $\lambda_1$   $\lambda_2$   cc      cc$_{10}$   cc$_1$
Zachary Karate Club   34     156     6.726         4.977         0.420   -           1
Drug User             616    4024    18.010        14.234        0.083   0.976       1
Yeast PPI             2224   13218   19.486        16.134        0.108   -           1
Pajek/Erdos971        472    2628    16.710        10.199        0.523   1           1
Pajek/Erdos972        5488   14170   14.448        11.886        0.122   -           -
Pajek/Erdos982        5822   14750   14.819        12.005        0.128   -           -
Pajek/Erdos992        6100   15030   15.131        12.092        0.143   -           -
SNAP/ca-GrQc          5242   28980   45.617        38.122        0.021   -           0.995
SNAP/ca-HepTh         9877   51971   31.035        23.004        0.007   -           -
SNAP/as-735           7716   26467   46.893        27.823        0.904   0.771       1
Gleich/Minnesota      2642   6606    3.2324        3.2319        0.087   -           -

Table 5.7: Intersection distance comparison of rankings based on the diagonal and
row sums of $e^A$ for various real-world networks.

Graph                 isim      isim$_{10\%}$   isim$_{1\%}$
Zachary Karate Club   0.044     0.111           0
Drug User             0.102     0.002           0
Yeast PPI             0.025     0.056           0
Pajek/Erdos971        0.004     0               0
Pajek/Erdos972        0.081     0.075           0.047
Pajek/Erdos982        0.079     0.065           0.044
Pajek/Erdos992        0.077     0.055           0.034
SNAP/ca-GrQc          0.043     0.091           5.49e-4
SNAP/ca-HepTh         0.142     0.319           0.134
SNAP/as-735           1.81e-4   0.001           0
Gleich/Minnesota      0.096     0.341           0.709

road network. The order n and number of nonzeros nnz of the corresponding adjacency
matrices are given in Table 5.6. These networks exhibit a wide variety of
structural properties and together constitute a rather heterogeneous sample of real-world
networks. All networks except the Yeast PPI network are simple and all are
undirected.

Table 5.6 reports the correlation coefficients between the two sets of rankings for
all the nodes, the top 10% of the nodes and the top 1% of the nodes (limited to the
cases where the two methods rank the same nodes in the top 10% and top 1%), as well
as the values of the two largest eigenvalues $\lambda_1$ and $\lambda_2$ of the adjacency matrix. A "-"
in the table signifies that different lists of top nodes were produced under the two
rankings, hence correlation coefficients could not be computed in such cases. Table
5.7 reports the intersection distances between the two sets of rankings for all, for the
top 10%, and for the top 1% of the nodes. Table 5.8 reports the normalized Estrada
index and normalized total network connectivity for each of the networks. For the

Table 5.8: Comparison of the normalized Estrada index EE(A)/n, the normalized
total network connectivity C(A)/n, and $e^{\|A\|_2}$ ($= e^{\lambda_1}$) for various real-world networks.

Graph                 normalized EE(A)   normalized C(A)   $e^{\lambda_1}$
Zachary Karate Club   30.62              608.79            833.81
Drug User             1.12e05            1.15e07           6.63e07
Yeast PPI             1.37e05            3.97e07           2.90e08
Pajek/Erdos971        3.84e04            4.20e06           1.81e07
Pajek/Erdos972        408.23             1.53e05           1.88e06
Pajek/Erdos982        538.58             2.07e05           2.73e06
Pajek/Erdos992        678.87             2.50e05           3.73e06
SNAP/ca-GrQc          1.24e16            8.80e17           6.47e19
SNAP/ca-HepTh         3.05e09            1.06e11           3.01e13
SNAP/as-735           3.00e16            3.64e19           2.32e20
Gleich/Minnesota      2.86               14.13             35.34

Zachary Karate Club, which only has 34 nodes, cc$_1$ = 1 and isim$_{1\%}$ = 0 indicate that
the top two ranked nodes under the two rankings are the same. The top node is node
34, which corresponds to the president of the karate club, and the second is node 1,
which corresponds to the instructor. These were the two most influential members of
the club and fought with each other to the point that eventually the club split into
two factions aligned around each of them [51].

The results indicate that there is a good deal of variation between the correlation 
coefficients for these networks. The correlation coefficient between the rankings of 
all the nodes ranges from a low of 0.007 for the SNAP/ca-HepTh network to a high 
of 0.904 for the SNAP/as-735 network. Even for networks that come from similar 
datasets, the correlation coefficients can be very different. For example, the networks 
in the Pajek group are all subsets of the Erdos collaboration network, but correlations 
between the two sets of rankings range between 0.122 for the Erdos972 network and 
0.583 for the Erdos971 network. 

For most of the networks, the correlation coefficient (when defined) increases 
when only the top 1% of nodes are considered (cc$_1$), sometimes greatly. Five of
the networks (Zachary Karate Club, Drug User, Yeast PPI, Pajek/Erdos971, and 
SNAP/as-735) produce the exact same rankings on the top 1% of nodes. Another 
network (SNAP/ca-GrQc) has a correlation coefficient greater than 0.9 on the top 1% 
of nodes. 

The intersection distance values behave in a similar way, although there is not as 
much variation in the values. Among all the nodes, the smallest intersection distance 
is 1.81e-4 for the as-735 network and the largest is 0.142 for the ca-HepTh network. 
These networks also had the largest and smallest correlation coefficients, respectively, 
for the full set of nodes. For 5 of the 11 networks examined, the intersection distance 
value decreases when only the top 10% of nodes are considered and for all cases 
except for the Minnesota road network, it decreases when only the top 1% of nodes 
are considered. 
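To make the quantity reported in Table 5.7 concrete, the sketch below computes a top-k
intersection distance between two ranked lists. It assumes the usual definition
isim$_k$(x, y) = (1/k) $\sum_{i=1}^{k}$ |x$_i$ Δ y$_i$| / (2i), where x$_i$ and y$_i$ are
the sets of the top i items in each list and Δ is the symmetric difference; this
definition is our assumption here, and the function name is hypothetical.

```python
def intersection_distance(rank_x, rank_y, k):
    """Top-k intersection distance between two ranked lists (best item first).

    Assumed definition: (1/k) * sum_{i=1..k} |x_i symdiff y_i| / (2 i),
    where x_i, y_i are the sets of the top-i items of each ranking.
    """
    total = 0.0
    for i in range(1, k + 1):
        top_x, top_y = set(rank_x[:i]), set(rank_y[:i])
        total += len(top_x ^ top_y) / (2 * i)
    return total / k

# Identical rankings, rankings that differ only in the first two places,
# and rankings whose top-2 sets are disjoint.
same = intersection_distance(["a", "b", "c", "d"], ["a", "b", "c", "d"], 4)
swapped = intersection_distance(["a", "b", "c", "d"], ["b", "a", "c", "d"], 4)
disjoint = intersection_distance(["a", "b"], ["c", "d"], 2)
print(same, swapped, disjoint)
```

Under this definition the distance lies between 0 (identical top-i sets for every i)
and 1 (disjoint top-i sets for every i), and disagreements near the top of the lists
are weighted more heavily than those further down.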

It is interesting to note that the similarity between the two ranking methods is 
very different on the ca-GrQc and the ca-HepTh networks. The two networks are both 
arXiv collaboration networks from subsections of physics so, intuitively, one would 

Fig. 5.2: The degree distributions of the ca-GrQc (left) and the ca-HepTh (right)
collaboration networks. 



assume that they behaved similarly. However, the two rankings are very different 
on the ca-HepTh network and are highly correlated on the ca-GrQc network. The 
ca-GrQc network has a spectral gap of approximately 7.5 while the spectral gap of 
ca-HepTh is approximately 8, only slightly larger. The relative spectral gaps are also 
comparable. Thus, it is clear that the spectral gap alone cannot be used to differentiate 
between the two ranking methods. It appears that while the two networks are both 
physics collaboration networks, there are significant structural differences between 
the two groups which cause the two ranking systems to behave very differently. Some 
insight can be gleaned by looking at the degree distributions of the two networks. 
Although the ca-HepTh network is almost twice as large as the ca-GrQc, the maximum 
degree on the network is only 65 while the maximum degree on the ca-GrQc network 
is 81. See Fig. 5.2 for the degree distributions of the two networks. Additionally, the
total communicability scores achieved by nodes in the ca-GrQc network range from 2.7
to 8.5e19 (the subgraph centrality scores range from 1.5 to 1.6e18). In contrast, even
with many more nodes, the total communicability scores of the ca-HepTh network
have a smaller range, from 2.7 to 3.2e13 (the subgraph centrality scores range from
1.5 to 9.7e11). It appears that the wider range of scores in the ca-GrQc network
helps to prevent rankings from being changed when the scores are perturbed by the
addition of off-diagonal communicabilities. This can be observed when looking at the
intersection distances between the two sets of rankings on the networks, which are
plotted in Fig. 5.3. Overall, the intersection distances are much lower for the ca-GrQc
network than for the ca-HepTh network. Additionally, for k < 34, isim$_k$(ca-GrQc) = 0,
indicating that the first 34 nodes are ranked exactly the same. In contrast,
isim$_k$(ca-HepTh) = 0 only for k < 5, after which there is a large jump in the intersection
distances. 

Similar behavior can be observed on the various instances of the Erdos collabo- 
ration network. Erdos971, which is very small, shows a high correlation between the 
two rankings; indeed, the rankings of the top 10% of nodes are exactly the same. On 
the other instances of the collaboration network, however, the rankings are somewhat 
different, as can be seen from the relatively low values of the correlation coefficients.

Fig. 5.3: The intersection distance values (isim$_k$) of the ca-GrQc (left) and the
ca-HepTh (right) collaboration networks.



The intersection distance values, while not very high, are somewhat higher than for 
most other networks. The maximum subgraph centrality and total communicability 
scores of the Erdos972 network are the smallest of any of the Erdos collaboration 
subgraphs. The maximum subgraph centrality score is 1.18e05 and the maximum 
total communicability score is 9.20e06. By comparison, on the (much smaller) Erdos971
network, the maximum subgraph centrality score is 1.11e06. On the Erdos982 network,
the maximum subgraph centrality score is 1.71e05 and on the Erdos992 network it 
is 2.47e05. Although the top 5 nodes of the Erdos972 network are exactly the same 
under the two ranking schemes, the relatively narrow range of possible scores means 
that the addition of off-diagonal values to the diagonal ones perturbs the rankings 
of the other nodes so much as to result in a relatively high value of the intersection 
distance among the top 1% of nodes. 

As before, the spectral gap for these networks does not give much insight into 
the behavior of the two ranking schemes, unless it is really large; the largest spectral 
gap for this set of test problems occurs for SNAP/as-735, and indeed here we observe
a strong correlation and a small intersection distance between the two metrics. Con- 
versely, for the (planar, fairly regular) Gleich/Minnesota network, the spectral gap is 
smallest and not surprisingly the correlation is very weak and the intersection distance 
for the top 1% of the nodes, isim$_{1\%}$, is very high at 0.709.

When examining the (normalized) total network connectivities of the various
networks (see Table 5.8), it can be seen that the ease of information sharing across the
networks varies widely. Some networks, such as the collaboration networks ca-GrQc
and ca-HepTh, have a high normalized C(A) (8.80e17 and 1.06e11, respectively). The
value is even higher for the SNAP/as-735 router network (C(A)/n = 3.64e19). The
Minnesota road network, on the other hand, has a normalized C(A) of only 14.13,
indicating that the network is relatively poorly connected, as one would expect in a
graph characterized by wide diameter, small bandwidth and high locality.
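As a quick sanity check on the reported values, the following Python sketch tests the
chain EE(A)/n ≤ C(A)/n ≤ $e^{\lambda_1}$ for a few rows, taking $\lambda_1$ from Table 5.6
and the normalized indices from Table 5.8. The chain is assumed here to be the
exponential counterpart of the resolvent bound stated in Section 7; the helper name
check_bounds is hypothetical, and only a subset of the networks is listed.

```python
import math

# (graph, lambda_1 from Table 5.6, normalized EE(A), normalized C(A) from Table 5.8)
rows = [
    ("Zachary Karate Club", 6.726, 30.62, 608.79),
    ("Drug User", 18.010, 1.12e5, 1.15e7),
    ("SNAP/ca-GrQc", 45.617, 1.24e16, 8.80e17),
    ("SNAP/as-735", 46.893, 3.00e16, 3.64e19),
]

def check_bounds(lambda_1, ee_norm, c_norm):
    """True if EE(A)/n <= C(A)/n <= exp(lambda_1) holds for the given values."""
    return ee_norm <= c_norm <= math.exp(lambda_1)

results = {name: check_bounds(lam, ee, c) for name, lam, ee, c in rows}
for name, ok in results.items():
    print(f"{name}: bounds hold = {ok}")

# The last column of Table 5.8 for the karate club is e^{6.726}.
karate_upper = math.exp(6.726)
print(f"exp(6.726) = {karate_upper:.2f}")
```

The gap between C(A)/n and $e^{\lambda_1}$ gives a rough indication of how far a network
is from the best connectivity that its leading eigenvalue would allow.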

5.5. Identification of essential proteins in PPI network of yeast. One
important application of node centrality measures is to rank nodes in protein-protein

Fig. 5.4: The intersection distance values (isim$_k$) of the Yeast PPI network.



interaction networks (PPIs) in an attempt to determine which proteins are essential, 
in the sense that their removal would result in the death of the cell. The goal of 
such rankings is for as many of the top-ranked nodes as possible to correspond to 
essential proteins. In [24], various centrality measures were tested on their ability
to identify essential proteins in the Yeast PPI network. It was shown that, among 
the centrality measures tested, subgraph centrality identified the highest percentage 
of essential proteins ranked in the top 30 nodes, identifying 18 essential proteins (in 
[23], subgraph centrality was said to identify 19 essential proteins, but this was later 
corrected [22]). When total communicability is used instead, the top 30 nodes are 
the same, so the same percentage of essential proteins are identified. The intersection 
distances between the two sets of rankings are displayed in Fig. 5.4. Here, it can be
seen that the intersection distances are small for approximately the top 50 nodes, then 
they begin to rise. The two rankings are least similar for nodes ranked 200-500, then 
their similarity increases again. As already noted, total communicability rankings can 
be calculated much more quickly than subgraph centrality rankings (see also section
6). Although there are currently methodologies which do better in protein ranking
(see [25] for example), our findings suggest that total communicability does provide
valuable information about the relative importance of nodes in the network. 

5.6. Further discussion of test results using real networks. The results 
just described indicate that in general the two centrality measures can produce sig- 
nificantly different rankings, even when one restricts the attention to the top 1% of 
nodes, and even for networks belonging to the same "family". As in the case of synthetic
networks, a wider range of values in the two sets of centralities leads to stronger
correlations between the corresponding rankings than in the case of a narrow range. 

Two extreme cases are represented by the SNAP/as-735 and Gleich/Minnesota 
data sets. The first one exhibits a large value of the spectral gap, and thus (as 
expected) a strong correlation between the two rankings; the second one has tiny 
spectral gap and results in very weakly correlated rankings. For networks that fall 
somewhere in between these two extremes, the observed correlation coefficients can
vary significantly. The subgraph centrality scores measure how "well-connected" a
node is in the network as a whole while the communicability score between nodes i
and j measures how well information travels between node i and node j. Thus, the
total communicability of node i is a measure of how well information travels between
node i and any node in the network (node i itself included). Although these two
measures are closely related, they are not quite the same. This observation suggests
that the two centrality measures reflect somewhat different structural properties of
the networks. Thus, they should be applied in concert rather than as alternatives to
one another, unless computational considerations dictate otherwise.

6. Computational aspects. There are various methods available to compute 
(or approximate) the matrix exponential. One of the most used schemes (which 
is implemented in Matlab as the expm function) is based on Padé approximations
combined with scaling and squaring [35, 36]. For a generic $n \times n$ matrix, this requires
$O(n^3)$ arithmetic operations and $O(n^2)$ storage. The prefactor multiplying $n^3$ in
the arithmetic complexity can vary widely depending on the sparsity and structural
properties of A.

Once the matrix exponential is computed, both the subgraph centrality and the 
total communicability rankings are readily obtained. However, to compute the subgraph
centrality rankings, we do not need the complete matrix exponential; we only
need the diagonal entries of $e^A$. Methods for efficiently estimating individual entries
of matrix functions have been developed by Golub, Meurant, and others [3H [B] and 
these methods have previously been applied to network analysis [H [5] . They are based 
on Gaussian quadrature and the Lanczos algorithm, and they have been implemented 
in the Matlab toolbox mmq [43]. The cost per node of estimating the subgraph centrality
is typically $O(n)$, giving a total cost of approximately $O(n^2)$ for estimating
the subgraph centrality for every node and computing the subgraph centrality rankings.
However, the coefficient of the $O(n)$ estimate can be quite large. Additionally,
the mmq toolbox-based implementation for calculating subgraph centrality that we use 
here has not been optimized, unlike the built-in Matlab function expm. We mention 
in passing that methods for quickly determining the top k nodes and only calculating 
the exact rankings on this subset have also been developed [SJ [3T] . 

The individual entries of the matrix exponential are not necessary for computing
the total communicability rankings; only the row sums of $e^A$ are necessary. An efficient
algorithm for evaluating $f(A)\mathbf{v}$ using a restarted Krylov method has recently been
presented in [I] I20j . In this approach, the basic operation is represented by matrix- 
vector products with A. This method has been implemented in the Matlab toolbox
funm_kryl by Stefan Güttel [34]. We apply this algorithm with $f(A) = e^A$ and $\mathbf{v} = \mathbf{1}$.
Clearly, the same algorithm can be used to rapidly compute $C(A) = \mathbf{1}^T e^A \mathbf{1}$. For many
networks of practical interest, the cost is typically $O(n)$, although the prefactor can
vary considerably for different types of networks.
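The key point, that the row sums of $e^A$ can be obtained from matrix-vector products
alone, can be illustrated with a few lines of Python. This is only a sketch: it replaces
the restarted Krylov method of funm_kryl with a plain truncated Taylor series applied
to the all-ones vector, which is a much cruder technique and is reasonable only when
the largest eigenvalue of A is moderate. The function names and the adjacency-list
storage are our own choices.

```python
import math

def matvec(adj, x):
    """y = A x for an undirected graph stored as adjacency lists."""
    return [sum(x[j] for j in nbrs) for nbrs in adj]

def total_communicability(adj, tol=1e-12, max_terms=200):
    """Row sums of e^A, accumulated as sum_k (A^k 1) / k! without forming e^A."""
    term = [1.0] * len(adj)          # current term A^k 1 / k!, starting at k = 0
    result = term[:]
    for k in range(1, max_terms):
        term = [t / k for t in matvec(adj, term)]
        result = [r + t for r, t in zip(result, term)]
        # All terms are nonnegative, so the largest entry measures what was added.
        if max(term) <= tol * max(result):
            break
    return result

# Path graph 0 - 1 - 2: the middle node should rank first.
path = [[1], [0, 2], [1]]
tc = total_communicability(path)
network_total = math.fsum(tc)        # C(A) = 1^T e^A 1
ranking = sorted(range(len(path)), key=lambda i: -tc[i])
print(tc, network_total, ranking)
```

Each pass costs one sparse matrix-vector product, so the work per term is proportional
to the number of edges; the full matrix $e^A$, which is generally dense, is never stored.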

Table 6.1 lists the timings for calculating the matrix exponential directly using
expm, estimating the subgraph centralities using the mmq toolbox (with 5 iterations of
the Lanczos algorithm per node), and estimating the total communicabilities using the
funm_kryl toolbox to estimate $e^A\mathbf{1}$ (using a very stringent stopping tolerance of
1e-16). These computations have been performed using Matlab Version 7.9.0 (R2009b)
on a 2.4 GHz Intel Core i5 processor with 4 GB of RAM. In general, the timings
with expm increase for increasing number of nodes, but structural properties of the
underlying graph, like the network diameter, can have a very significant impact on
the computing times. For example, the yeast PPI network and the Minnesota road
network have approximately the same number n of nodes (2224 and 2642, respectively),
yet computing the matrix exponential for the yeast network takes almost 25
times longer than for the Minnesota road network. This appears to be due to the fact
that the yeast network has a much smaller diameter than the Minnesota network;
therefore, the powers $A^k$ of the adjacency matrix fill up much more quickly. Since the
algorithm implemented in expm involves solving linear systems with polynomials in
A as coefficient matrices, the execution time for sparse matrices with small diameter
tends to be much higher than for matrices exhibiting a high degree of locality.

M. Benzi and C. Klymko

Table 6.1: Timings (in seconds) to compute centrality rankings based on the diagonal
and row sums of $e^A$ for various test problems using different methods.

Graph                 expm       mmq      funm_kryl
Zachary Karate Club   0.062      0.138    0.120
Drug User             0.746      2.416    0.363
Yeast PPI             47.794     9.341    0.402
Pajek/Erdos971        0.542      2.447    0.317
Pajek/Erdos972        579.214    35.674   0.410
Pajek/Erdos982        612.920    39.242   0.393
Pajek/Erdos992        656.270    53.019   0.325
SNAP/ca-GrQc          281.814    23.603   0.465
SNAP/ca-HepTh         2710.802   58.377   0.435
SNAP/as-735           2041.439   75.619   0.498
Gleich/Minnesota      1.956      10.955   0.329

For the majority of the networks tested, using the mmq toolbox to estimate sub- 
graph centrality was faster than using expm, frequently by far. The exceptions 
(Zachary Karate Club, Drug User, Erdos971, and Minnesota) were the networks with 
a small number of nodes and/or a high diameter. 

The computation of the total communicabilities using funm_kryl was by far the
fastest method for all networks tested, with the only exception of the tiny Zachary 
Karate Club network. In principle, this is a clear advantage of total communicability 
over subgraph centrality. However, as we saw, the two methods often result in rather 
different rankings, therefore we cannot simply replace subgraph centrality with total 
communicability. 

6.1. A large-scale example. In addition to the test results discussed above, 
we performed tests with the digraph of Wikipedia (as of June 6, 2011), where nodes 
correspond to entries and directed links to hyperlinks from one entry to another. In 
this case, the entries of $e^A\mathbf{1}$ provide a ranking of the hubs in the network, see [S].
This graph contains 4,189,503 nodes and 67,197,636 links, and it is prohibitively large 
for centrality measures based on estimating the diagonals of the matrix exponential. 
For this reason, we limit ourselves to computations using the funm_kryl toolbox to
estimate the row sum vector $e^A\mathbf{1}$. The restart parameter was set to 10 and we allowed
a maximum of 50 restarts. The run time to obtain the rankings on a parallel system 
comprising 24 Intel(R) Xeon(R) E5-2630 2.30GHz CPU(s) was 216.7 seconds. This 
shows that centrality calculations using total communicability are quite feasible even 
for large networks. 

7. Resolvent-based centrality measures. There are matrix functions other 
than the matrix exponential that may be used to calculate subgraph centrality and 
subgraph communicability scores. The most common of these is the matrix resolvent

$$(I - \alpha A)^{-1} = I + \alpha A + \alpha^2 A^2 + \cdots + \alpha^k A^k + \cdots = \sum_{k=0}^{\infty} \alpha^k A^k, \qquad (7.1)$$

where $0 < \alpha < 1/\rho(A)$, with $\rho(A)$ the spectral radius of A. This was first used by Katz
[57] in the 1950s and has been used in various forms since then [TUl [T^ ITil [?n [Ml 
HU] . The bounds on $\alpha$ ensure that $I - \alpha A$ is invertible and that the geometric series
converges to the inverse. Additionally, the inverse is nonnegative; indeed, $I - \alpha A$ is
a nonsingular M-matrix. Note that if A is the adjacency matrix of an undirected
network, $\rho(A) = \lambda_{\max}(A) = \|A\|_2$. Since the spectral radius of a nonnegative matrix
always satisfies $\rho(A) \geq \min_i d_i$, with $d_i$ the $i$th row sum (here, the degree of node i),
it follows that for a connected undirected graph $\alpha$ must be less than 1.

As with the matrix exponential, $[(I - \alpha A)^{-1}]_{ii}$ counts the number of closed walks
centered at node i and $\sum_{j=1}^{n} [(I - \alpha A)^{-1}]_{ij}$ counts all walks between node i and all
other nodes in the network. In this case, however, a walk of length k is penalized
by a factor of $\alpha^k$. One drawback of the use of the matrix resolvent in determining
centrality rankings is the need to choose the value of $\alpha$; also, different values of $\alpha$
can lead to different rankings. For the purposes of the experiments below, we select
$\alpha = 0.85/\lambda_{\max}(A)$ (similar to the choice of parameter in PageRank [40]).
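The resolvent-based row sums can likewise be computed with matrix-vector products
only. The Python sketch below is illustrative rather than the authors' implementation:
it assumes the parameter choice $\alpha = 0.85/\lambda_{\max}(A)$ (our reading of the
PageRank-like choice mentioned above), estimates $\lambda_{\max}(A)$ for a connected
undirected graph by power iteration on A + I, and sums the Neumann series through the
fixed-point iteration x = 1 + $\alpha$Ax. All function names are hypothetical.

```python
import math

def matvec(adj, x):
    """y = A x for an undirected graph stored as adjacency lists."""
    return [sum(x[j] for j in nbrs) for nbrs in adj]

def largest_eigenvalue(adj, iters=1000):
    """Power iteration on A + I; the shift avoids oscillation on bipartite graphs."""
    x = [1.0] * len(adj)
    for _ in range(iters):
        y = [a + b for a, b in zip(matvec(adj, x), x)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x (x has unit norm).
    return sum(a * b for a, b in zip(matvec(adj, x), x))

def katz_row_sums(adj, alpha, tol=1e-12, max_iter=10000):
    """Row sums of (I - alpha*A)^{-1} via the fixed point x = 1 + alpha*A*x."""
    x = [1.0] * len(adj)
    for _ in range(max_iter):
        new = [1.0 + alpha * y for y in matvec(adj, x)]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

# Path graph 0 - 1 - 2, whose largest eigenvalue is sqrt(2).
path = [[1], [0, 2], [1]]
lam = largest_eigenvalue(path)
alpha = 0.85 / lam                 # assumed parameter choice, see the lead-in
katz = katz_row_sums(path, alpha)
total_r = math.fsum(katz)          # resolvent-based total communicability C_r(A)
print(lam, alpha, katz, total_r)
```

With this choice of $\alpha$ the iteration contracts by a factor of about 0.85 per step,
so convergence is linear; values of $\alpha$ closer to $1/\lambda_{\max}(A)$ would make it
slower.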

Resolvent-based total network communicability can also be evaluated. As when 
using the matrix exponential (cf. section^]) , the resolvent-based total network commu- 
nicability is an upper bound for the resolvent-based Estrada index. In the following, 
$C_r(A) = \sum_{i=1}^{n} \sum_{j=1}^{n} [(I - \alpha A)^{-1}]_{ij}$ denotes the resolvent-based total communicability
of a network. The following Proposition can be easily proved along the same lines
as Proposition 1.

Proposition 2. Let A be the adjacency matrix of a simple, undirected network
on n vertices. Then for any $0 < \alpha < 1/\|A\|_2$,

$$EE_r(A) := \mathrm{Tr}\left[(I - \alpha A)^{-1}\right] \leq C_r(A) \leq \frac{n}{1 - \alpha \|A\|_2}.$$

For an undirected network, $\lambda_{\max}(A) = \lambda_1$ can replace $\|A\|_2$ in the upper bound above.
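For orientation, one way to obtain the two inequalities is sketched below. This is our
own outline under the stated hypotheses (A symmetric and entrywise nonnegative,
$0 < \alpha < 1/\|A\|_2$), not a reproduction of the authors' proof; $\mathbf{1}$ denotes the
all-ones vector.

```latex
\begin{align*}
C_r(A) - EE_r(A)
  &= \sum_{i \neq j} \left[(I - \alpha A)^{-1}\right]_{ij}
   = \sum_{i \neq j} \sum_{k=1}^{\infty} \alpha^k \left[A^k\right]_{ij} \;\geq\; 0, \\
C_r(A)
  &= \mathbf{1}^T (I - \alpha A)^{-1} \mathbf{1}
   \;\leq\; \|\mathbf{1}\|_2^2 \, \left\|(I - \alpha A)^{-1}\right\|_2
   = \frac{n}{1 - \alpha \lambda_{\max}(A)}
   = \frac{n}{1 - \alpha \|A\|_2}.
\end{align*}
```

The first line uses the nonnegativity of every power of A; the second uses the fact
that, for symmetric A, the eigenvalues of $(I - \alpha A)^{-1}$ are $1/(1 - \alpha\lambda_i)$,
all positive, with the largest attained at $\lambda_{\max}(A) = \|A\|_2$.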

The resolvent-based subgraph centrality and total communicability rankings were 
compared on the same two sets of synthetic networks used for the tests in section 5.1.

Table 7.1 lists the average correlation coefficient between the subgraph centrality
and total communicability rankings for the nodes in networks constructed using the
preferential attachment model (function pref in CONTEST) and Table 7.2 lists the
intersection distances for all the nodes and for the top 10% of the nodes. For small
values of d ($1 \leq d \leq 3$), the correlation coefficients between the two sets of rankings
using the matrix resolvent are close to those using the matrix exponential. However, 
when using the matrix exponential the average correlation coefficient was found to be 
greater than 0.9 for all d > 4, and exactly 1 for all d > 8. Using the matrix resolvent 
the correlation coefficient grows as d increases, but somewhat more slowly than for the 
matrix exponential. The intersection distances are also larger for all values of d when 
the matrix resolvent is used, although they also decrease as d increases. Moreover, 
we did not find a single instance where the two methods produced exactly the same 
rankings. 

For the small world networks, all values $1 \leq d \leq 10$ as well as all multiples
of 10 with $20 \leq d \leq 200$ were tested. For each d, twenty networks were tested.

Table 7.1: Comparison using correlation coefficients of rankings based on the diagonal
entries and row sums of $(I - \alpha A)^{-1}$ for 1000-node scale-free networks of various
parameters built using the pref function in the CONTEST Matlab toolbox. For
each instance, the results are measured for $\alpha = 0.85/\lambda_{\max}(A)$. The values reported are the
averages over 20 matrices with the same parameters.

     (a)              (b)              (c)
d     cc         d     cc         d     cc
1     0.292      10    0.691      110   0.964
2     0.370      20    0.840      120   0.965
3     0.442      30    0.890      130   0.968
4     0.486      40    0.917      140   0.970
5     0.536      50    0.933      150   0.971
6     0.583      60    0.942      160   0.973
7     0.607      70    0.949      170   0.973
8     0.638      80    0.954      180   0.975
9     0.667      90    0.958      190   0.976
                 100   0.962      200   0.976

Table 7.2: Intersection distance comparison of rankings based on the diagonal entries
and row sums of $(I - \alpha A)^{-1}$ for 1000-node scale-free networks of various parameters
built using the pref function in the CONTEST Matlab toolbox. For each instance,
the results are measured for $\alpha = 0.85/\lambda_{\max}(A)$. The values reported are the averages over
20 matrices with the same parameters.

            (a)                            (b)                            (c)
d     isim    isim$_{10\%}$      d     isim    isim$_{10\%}$      d     isim    isim$_{10\%}$
1     0.186   0.491              10    0.105   0.051              110   0.006   0.001
2     0.205   0.364              20    0.055   0.020              120   0.005   7.12e-4
3     0.192   0.235              30    0.035   0.012              130   0.005   6.98e-4
4     0.179   0.173              40    0.025   0.007              140   0.004   5.74e-4
5     0.163   0.126              50    0.019   0.005              150   0.004   5.62e-4
6     0.150   0.102              60    0.015   0.004              160   0.003   3.69e-4
7     0.137   0.082              70    0.012   0.003              170   0.003   4.25e-4
8     0.124   0.068              80    0.010   0.002              180   0.003   3.11e-4
9     0.115   0.059              90    0.009   0.002              190   0.003   3.16e-4
                                 100   0.007   0.001              200   0.002   4.00e-4


The averages of the correlation coefficients between the subgraph centrality and total
communicability rankings can be found in Table 7.3 and the average intersection
distances for both all the nodes and the top 10% of the nodes can be found in Table
7.4. As was the case for the matrix exponential, the two methods (diagonal entries and
row sums) using the matrix resolvent exhibit much weaker correlations for this class 
of networks than for the preferential attachment networks; indeed, the correlations 

Table 7.3: Comparison using correlation coefficients of rankings based on the diagonal
entries and row sums of $(I - \alpha A)^{-1}$ for 1000-node small world networks of various
parameters built using the smallw function with p = 0.1 in the CONTEST Matlab
toolbox. For each instance, the results are measured for $\alpha = 0.85/\lambda_{\max}(A)$. The values
reported are the averages over 20 matrices with the same parameters.

     (a)              (b)              (c)
d     cc         d     cc         d     cc
1     0.065      10    0.063      110   0.294
2     0.023      20    0.078      120   0.246
3     0.052      30    0.080      130   0.275
4     0.052      40    0.135      140   0.311
5     0.052      50    0.144      150   0.312
6     0.051      60    0.141      160   0.321
7     0.062      70    0.144      170   0.301
8     0.037      80    0.133      180   0.293
9     0.050      90    0.248      190   0.354
                 100   0.190      200   0.300


Table 7.4: Intersection distance comparison of rankings based on the diagonal entries
and row sums of $(I - \alpha A)^{-1}$ for 1000-node small world networks of various parameters
built using the smallw function with p = 0.1 in the CONTEST Matlab toolbox. For
each instance, the results are measured for $\alpha = 0.85/\lambda_{\max}(A)$. The values reported are the
averages over 20 matrices with the same parameters.

            (a)                            (b)                            (c)
d     isim    isim$_{10\%}$      d     isim    isim$_{10\%}$      d     isim    isim$_{10\%}$
1     0.040   0.149              10    0.156   0.435              110   0.147   0.541
2     0.070   0.189              20    0.207   0.508              120   0.148   0.553
3     0.085   0.241              30    0.198   0.517              130   0.160   0.554
4     0.091   0.269              40    0.204   0.571              140   0.142   0.560
5     0.098   0.301              50    0.207   0.621              150   0.123   0.542
6     0.104   0.318              60    0.191   0.588              160   0.121   0.539
7     0.126   0.361              70    0.181   0.582              170   0.124   0.517
8     0.135   0.414              80    0.189   0.607              180   0.125   0.512
9     0.149   0.413              90    0.156   0.597              190   0.114   0.504
                                 100   0.179   0.585              200   0.123   0.504


tend to be even smaller for the resolvent than for the exponential. For d = 1, the
average correlation is 0.065 and the average intersection distance is 0.040 using the
resolvent, compared to a correlation of 0.177 and an intersection distance of 0.015
using the exponential. For the values of d tested, the highest average correlation
coefficient was 0.354, for d = 190. When looking at the intersection distances for
other values of d, the picture is somewhat different. Comparing Table 7.4 with Table

Table 7.5: Comparison using correlation coefficients of rankings based on the diagonal
entries and row sums of $(I - \alpha A)^{-1}$ with $\alpha = 0.85/\lambda_{\max}(A)$ for various real-world networks.

Graph                 cc      cc$_{10}$   cc$_1$
Zachary Karate Club   0.589   1           1
Drug User             0.189   -           -
Yeast PPI             0.177   -           -
Pajek/Erdos971        0.233   -           1
Pajek/Erdos972        0.215   -           -
Pajek/Erdos982        0.207   -           -
Pajek/Erdos992        0.197   -           -
SNAP/ca-GrQc          0.070   -           -
SNAP/ca-HepTh         0.072   -           -
SNAP/as-735           0.204   -           -
Gleich/Minnesota      0.019   -           -

Table 7.6: Intersection distance comparison of rankings based on the diagonal entries
and row sums of $(I - \alpha A)^{-1}$ with $\alpha = 0.85/\lambda_{\max}(A)$ for various real-world networks.

Graph                 isim    isim$_{10\%}$   isim$_{1\%}$
Zachary Karate Club   0.061   0               0
Drug User             0.125   0.145           0.069
Yeast PPI             0.204   0.363           0.187
Pajek/Erdos971        0.080   0.050           0
Pajek/Erdos972        0.110   0.273           0.263
Pajek/Erdos982        0.109   0.269           0.264
Pajek/Erdos992        0.109   0.271           0.247
SNAP/ca-GrQc          0.047   0.122           0.033
SNAP/ca-HepTh         0.058   0.159           0.236
SNAP/as-735           0.247   0.513           0.271
Gleich/Minnesota      0.102   0.301           0.557


5.4, we see that for small d the intersection distance between the two ranking schemes
tends to be somewhat higher with the matrix exponential than with the resolvent. 
However, as d increases the intersection distance eventually drops with the matrix 
exponential, but not with the resolvent. This is true both when looking at the ranking 
of all the nodes and when looking at only the top 10%. 

Next, we consider tests with real-world networks. As shown in Table 7.5, the
correlation coefficients between the two ranking systems for the whole set of nodes
were higher (in a majority of cases) using the matrix resolvent than they were using
the matrix exponential. (Again, a "-" signifies that correlation coefficients could not
be computed because the two ranking schemes produced different lists
of nodes.) Only the Erdos971, as-735, and Minnesota networks had a higher
correlation coefficient between the two ranking systems under the exponential than
under the matrix resolvent. This can be understood when looking at the normalized
Estrada indices and total network communicabilities in Table 7.7. The smaller the
Table 7.7: Comparison of the normalized resolvent-based Estrada index $EE_r(A)/n$
and total network connectivity $C_r(A)/n$ for various real-world networks. Here, $f(A) =
(I - \alpha A)^{-1}$ with $\alpha$ as in Table 7.5.



Graph                  normalized EE_r(A)    normalized C_r(A)
Zachary Karate Club    1.21                  5.13
Drug User              1.03                  2.36
Yeast PPI              1.03                  2.17
Pajek/Erdos971         1.03                  2.44
Pajek/Erdos972         1.01                  1.70
Pajek/Erdos982         1.01                  1.66
Pajek/Erdos992         1.01                  1.60
SNAP/ca-GrQc           1.00                  1.21
SNAP/ca-HepTh          1.01                  1.24
SNAP/as-735            1.00                  1.86
Gleich/Minnesota       1.27                  3.44


factor $\alpha$, the more it minimizes the contribution of the network data from $A$ to
the scores produced by the diagonal entries or row sums of $(I - \alpha A)^{-1}$. This can
also be seen by noticing that as $\alpha \to 0$, $(I - \alpha A)^{-1}$ approaches the identity. In these
experiments, $\alpha$ is inversely proportional to the largest eigenvalue $\lambda_1$ of $A$. However, this also means that for the networks tested with
a large maximum eigenvalue (such as ca-GrQc, ca-HepTh, and as-735) $\alpha$ is quite
small, causing the resulting subgraph centrality scores to be small and, consequently,
close together. In the case of a network with a small maximum eigenvalue (such as
the Minnesota network), the effect of $\alpha$ is not as pronounced. The compression of the
score values means that a perturbation of the scores (such as occurs when switching 
from subgraph centrality scores to total communicability scores) has a large effect on 
the node rankings, especially for the higher ranked nodes. 
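
The compression effect can be made explicit with the Neumann series, which converges for $0 < \alpha < 1/\lambda_1(A)$. The following is a sketch under the assumption of a simple undirected graph (no loops, so that $[A]_{ii} = 0$ and $[A^2]_{ii} = d_i$, the degree of node $i$); the symbol $d_i$ is introduced here for illustration only.

```latex
\[
(I - \alpha A)^{-1} = \sum_{k=0}^{\infty} \alpha^k A^k ,
\qquad
\left[(I - \alpha A)^{-1}\right]_{ii} = 1 + \alpha^2 d_i + O(\alpha^3),
\qquad
\left[(I - \alpha A)^{-1}\mathbf{1}\right]_{i} = 1 + \alpha\, d_i + O(\alpha^2).
\]
```

Under these assumptions the diagonal entries depart from 1 only at second order in $\alpha$, whereas the row sums depart at first order. This is consistent with the normalized resolvent-based Estrada indices in Table 7.7 lying much closer to 1 than the normalized row-sum totals.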

When only the top 1% of nodes were considered, the exponential subgraph cen- 
trality and exponential total communicability rankings were much closer together 
than their resolvent counterparts, where often the top 1% of nodes were not even the 
same. This seems to indicate that when using the resolvent, the subgraph centrality 
and total communicability tend to rank the less important nodes more similarly than 
they do under the matrix exponential. Under the matrix exponential, the two rank- 
ings seem to agree more closely on the important nodes than they do when using the 
resolvent. This can also be seen when looking at the intersection distance, which gives 
more weight to differences in the top ranked nodes than in the lower ranked nodes. 
For all networks except ca-HepTh, the intersection distance between the two rankings 
is smaller when using the exponential than when using the resolvent. When looking at 
the top 1% of nodes, the intersection distances are also smaller (often much smaller) 
in the case of the exponential, for all except three of the networks. The exceptions 
are the Minnesota road network (which has a large intersection distance on the top 
1% of nodes for both the exponential and the resolvent) and the Zachary Karate Club 
and Erdos971 networks (which have isim_{1%} = 0 in both cases).
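
For reference, the intersection distance can be sketched in a few lines. The sketch assumes the top-k definition $\mathrm{isim}_k(x, y) = \frac{1}{k}\sum_{i=1}^{k} |x_i \,\Delta\, y_i| / (2i)$, where $x_i$ and $y_i$ are the sets of the top $i$ nodes in each ranking and $\Delta$ is the symmetric difference. The function name and the list-based input format are illustrative choices, not taken from the paper.

```python
def intersection_distance(rank_x, rank_y, k=None):
    """Top-k intersection distance between two ranked lists (best node first).

    Assumes each list contains distinct node labels. Returns 0.0 when the
    top-k orderings are identical and 1.0 when the top-k lists are disjoint.
    """
    if k is None:
        k = min(len(rank_x), len(rank_y))
    if k <= 0:
        raise ValueError("k must be positive")
    top_x, top_y = set(), set()
    total = 0.0
    for i in range(1, k + 1):
        # Grow the two top-i sets by one node each.
        top_x.add(rank_x[i - 1])
        top_y.add(rank_y[i - 1])
        # Size of the symmetric difference, scaled to lie in [0, 1].
        total += len(top_x ^ top_y) / (2 * i)
    return total / k
```

Because a disagreement at position $i$ is carried into every larger prefix until it is resolved, differences near the top of the lists contribute to more terms of the sum, which is why this measure weights top-ranked nodes more heavily.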

Another observation that can be made is that the resolvent-based total network
communicability $C_r(A)$ is unable to discriminate between highly connected networks
and poorly connected ones, in stark contrast with the exponential-based one. For
instance, in the case of the Minnesota road network $\alpha$ is relatively large (since $\lambda_1$ is
small for this graph), hence the off-diagonal contributions to $C_r(A)$ are more significant
than for other networks where $\lambda_1$ is large (thus forcing a small value of $\alpha$, leading
to a resolvent very close to the identity matrix). Thus, only the exponential-based 
total network communicability should be used when comparing different networks in 
terms of ease of communication. 

When the identification of essential proteins in the Yeast PPI network is con- 
sidered using resolvent-based total communicability, the results are comparable to 
those using the exponential. The resolvent-based total communicability rankings,
with $\alpha$ chosen as above, identified 17 essential proteins in the top 30 (as compared to 18 identified
by exponential subgraph centrality and total communicability). The resolvent-based 
subgraph centrality, however, identified 19 essential proteins in the top 30, slightly 
outperforming the other methods. 

Concerning the computational complexity, when dealing with large networks the 
use of the conjugate gradient method (possibly with some type of preconditioning) to 
solve the linear system $(I - \alpha A)\mathbf{x} = \mathbf{1}$ is orders of magnitude faster than trying to
estimate the diagonal entries of $(I - \alpha A)^{-1}$. For certain networks, Chebyshev semi-
iteration can be even faster [7]. Thus, as was the case for the matrix exponential,
rankings based on total communicability (row sums) are a lot cheaper than the rank- 
ings based on subgraph centrality (diagonals). Once again, however, the two ranking 
methods in general produce different rankings, so one should not choose between the 
two based solely on computational cost. 
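
The row-sum computation described above amounts to a single symmetric positive definite solve. The following is a minimal sketch of unpreconditioned conjugate gradients for $(I - \alpha A)\mathbf{x} = \mathbf{1}$, assuming an undirected graph stored as adjacency lists and $0 < \alpha < 1/\lambda_1(A)$. The function names, tolerance, and iteration cap are illustrative assumptions, and no preconditioning or Chebyshev acceleration is included.

```python
def resolvent_matvec(adj, alpha, x):
    """Return (I - alpha*A) x for an undirected graph given as adjacency lists."""
    return [x[i] - alpha * sum(x[j] for j in adj[i]) for i in range(len(adj))]


def resolvent_row_sums(adj, alpha, tol=1e-12, max_iter=1000):
    """Solve (I - alpha*A) x = 1 by conjugate gradients.

    Assumes A is symmetric and 0 < alpha < 1/lambda_1(A), so that the
    coefficient matrix is symmetric positive definite.
    """
    n = len(adj)
    x = [0.0] * n
    r = [1.0] * n              # residual b - Mx for x = 0 and b = 1
    p = r[:]                   # initial search direction
    rs = sum(v * v for v in r)
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:   # stop once the residual norm is small
            break
        Mp = resolvent_matvec(adj, alpha, p)
        step = rs / sum(pi * mi for pi, mi in zip(p, Mp))
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * mi for ri, mi in zip(r, Mp)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Path graph on three nodes: 0 - 1 - 2, with lambda_1 = sqrt(2).
scores = resolvent_row_sums([[1], [0, 2], [1]], alpha=0.5)
```

Each iteration costs one product with $A$, so the work scales with the number of edges, whereas estimating every diagonal entry of the inverse requires a separate computation per node.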

8. Conclusions. We have examined the use of total communicability as a method 
for ranking the importance of nodes in a network. Like the subgraph centrality rank- 
ing, the total communicability ranking using the matrix exponential counts the num- 
ber of walks starting at a given node, weighting walks of length $k$ by a penalization
factor of $1/k!$. However, instead of only counting closed walks, it counts all walks be-
tween the given node and every node in the network. If the matrix resolvent is used,
the weight on the walks becomes $\alpha^k$ for a chosen parameter $\alpha$ in a certain range. There
are various classes of graphs on which it can be shown that the two exponential-based 
rankings are always identical or in very good agreement; for instance, certain types 
of simple regular graphs and Erdos-Renyi random graphs with large spectral gap. 
However, as is well known, these classes are not realistic models of real-world complex 
networks. 

The two sets of rankings (total communicability and subgraph centrality) have 
been used to rank the nodes of networks corresponding to both real and synthetic 
data sets. The synthetic data sets were constructed using the preferential attachment 
(Barabási-Albert) and the small world (Watts-Strogatz) models, corresponding to the
functions pref and smallw of the CONTEST toolbox for Matlab. Good agreement
between the two ranking methods was observed on the networks obtained with the
preferential attachment method, especially as the density of the graphs increased.
More pronounced differences between the rankings produced with the two methods
were observed in the case of small world networks. Overall, the two importance
rankings matched more closely when the matrix exponential was used than when
the matrix resolvent was used.

We also presented the results of experiments with real-world networks including 
social networks, citation networks, PPI networks, and infrastructure (transportation) 
networks. Here we found that overall, the two (complete) sets of rankings were closer 
to each other when the matrix resolvent was used instead of the matrix exponential.
However, when only the top 1% of nodes was examined, the rankings matched
more closely when the matrix exponential was used. This suggests that, for the 
networks tested, the resolvent-based rankings match more closely on "unimportant" 
(low-ranked) nodes and the exponential-based rankings exhibit more agreement on 
the "important" (top-ranked) nodes. 

In general, there is no simple way to compare two ranking schemes and determine 
that one is "better" than the other. However, the total communicability rankings take 
into account more of the network topology than the subgraph centrality rankings (all 
walks starting at node i versus all closed walks starting at node i). This added infor- 
mation often (but not always) changes the ranking of the nodes to a certain degree, 
although there are many cases where there is still a strong similarity between the two 
sets of rankings. The main benefit of using total communicability to rank the nodes is 
that the ranking can be estimated extremely quickly using Krylov subspace methods. 
Indeed, as the Wikipedia graph calculation described earlier shows, for very
large networks only the total communicability (row sum) method is computationally 
feasible, the subgraph centrality ranking being prohibitively expensive to compute. 
Even if total communicability cannot always be recommended as a cheaper alternative 
to subgraph centrality, it provides valuable information about the network and can 
be used along with other ranking schemes. 
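
The row sums of $e^A$ need only matrix-vector products, which is the source of their low cost. The sketch below illustrates this with a truncated Taylor series rather than the Krylov subspace methods referred to in the text, so it is a simplified stand-in. The function names, the adjacency-list input, and the default of 60 terms are assumptions made for illustration. For graphs with a large $\lambda_1$ a Taylor series needs many terms and the partial sums can overflow, so this should not be read as a method for large networks.

```python
import math


def total_communicability(adj, terms=60):
    """Approximate the row sums of exp(A) with a truncated Taylor series.

    adj is a list of adjacency lists for an undirected graph. Only products
    of A with a vector are used; exp(A) itself is never formed.
    """
    n = len(adj)
    term = [1.0] * n           # current term A^k 1 / k!, starting at k = 0
    total = term[:]
    for k in range(1, terms + 1):
        # term <- A * term / k, which turns A^(k-1) 1/(k-1)! into A^k 1/k!
        term = [math.fsum(term[j] for j in adj[i]) / k for i in range(n)]
        total = [t + s for t, s in zip(total, term)]
    return total


def ranking(scores):
    """Node indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


# Star graph: node 0 is the hub and should be ranked first.
star_scores = total_communicability([[1, 2, 3], [0], [0], [0]])
star_ranking = ranking(star_scores)
```

Summing the returned scores and dividing by the number of nodes gives the normalized total network communicability discussed in the text.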

Finally, we have introduced the total communicability of a network as a global 
measure of connectivity and of the ease of information flow on a given network. This 
measure can be computed quickly even for very large networks, and could be of interest 
in the design of communication networks. 

Acknowledgments. We are indebted to Prof. Ernesto Estrada (University of 
Strathclyde) for providing the Intravenous Drug User and Yeast PPI network data, 
and to Mr. Yu Wang (Emory University) for performing the calculations with the 